OzoneXpect App

Akash Mer

2023-07-29

Introduction

OzoneXpect predicts mean ozone levels in parts per billion (ppb) from a measurement taken by, or information known to, the user.
The user supplies the following inputs:

  1. Measurement taken/information known to the user - Represented as the predictor in the app. The user can choose from the following:
    • Temperature (default)
    • Wind speed
    • Solar radiation
  2. Current Month - The month in which the above measurement was taken. The user can choose any month from May to September, or Any Month (default) to leave the month unspecified.
  3. Measurement Value - The value of the measurement to be used for the prediction.

Data Used - The data for the app come from the airquality data set in the R datasets package, whose structure is as follows:

str(airquality)
'data.frame':   153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

27.45% of rows contain at least one missing value. These missing values were imputed with the impute.knn() function from the impute package, using k = 10.
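The missing-value share and the imputation step can be sketched as follows. This is a minimal illustration, not the app's actual code: it assumes the Bioconductor package impute is installed, and passing the whole data frame to impute.knn() as a numeric matrix is an assumption about how the function was called. The names aq, pct_incomplete and aq_imputed are hypothetical.

```r
# Share of rows with at least one missing value
aq <- datasets::airquality
pct_incomplete <- 100 * mean(!stats::complete.cases(aq))
round(pct_incomplete, 2)  # 27.45, matching the figure quoted above

# kNN imputation with k = 10. Runs only if the Bioconductor package
# 'impute' is available; converting the full data frame to a matrix
# is an assumption, not something confirmed by the app's source.
if (requireNamespace("impute", quietly = TRUE)) {
  imputed <- impute::impute.knn(as.matrix(aq), k = 10)
  aq_imputed <- as.data.frame(imputed$data)
  anyNA(aq_imputed)  # sanity check that no missing values remain
}
```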

How does it predict?

OzoneXpect predicts using 2 different models:

  1. Linear Model - A linear model is fitted to the airquality data with ozone as the outcome and the chosen measurement as the predictor. If a particular month was selected, the data are subset to that month, which stratifies the analysis and avoids confounding by the month variable. This model is built using the lm() function in R.
  2. Loess Model - A non-linear model is fitted under the same conditions as above, using the loess() function in R with a span of 0.7.

The mean ozone level is then predicted with both models, and each prediction is returned with its 95% prediction interval.
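The two-model prediction described above might look roughly like the sketch below. Several things here are assumptions rather than the app's confirmed implementation: the function name predict_ozone and its arguments are hypothetical, complete cases are used in place of the imputed data to keep the example self-contained, and the loess interval is assembled by hand from the standard error and residual scale that predict() returns, because predict() does not provide a prediction interval for loess fits directly.

```r
# Hypothetical helper: predicts mean ozone (ppb) from one predictor value.
# Uses complete cases rather than the kNN-imputed data (an assumption).
predict_ozone <- function(predictor = "Temp", value, month = NULL,
                          data = stats::na.omit(datasets::airquality)) {
  # Subset to a single month when one was specified
  if (!is.null(month)) data <- data[data$Month == month, ]

  form <- stats::reformulate(predictor, response = "Ozone")
  newdata <- stats::setNames(data.frame(value), predictor)

  # Linear model with a 95% prediction interval
  lin_fit <- stats::lm(form, data = data)
  lin <- stats::predict(lin_fit, newdata,
                        interval = "prediction", level = 0.95)

  # Loess model with span 0.7; the interval is built manually from the
  # standard error of the fit and the residual scale (an assumption)
  lo_fit <- stats::loess(form, data = data, span = 0.7)
  lo <- stats::predict(lo_fit, newdata, se = TRUE)
  half <- stats::qt(0.975, lo$df) * sqrt(lo$se.fit^2 + lo$residual.scale^2)

  data.frame(
    model = c("Linear Model", "Loess Model"),
    fit = unname(c(lin[1, "fit"], lo$fit)),
    lwr = unname(c(lin[1, "lwr"], lo$fit - half)),
    upr = unname(c(lin[1, "upr"], lo$fit + half))
  )
}
```

For example, predict_ozone("Temp", 80) uses all months, while predict_ozone("Wind", 8, month = 7) restricts the fit to July.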

Outputs

1. Relationship Plot

Salient Features :

2. Predictions

Salient Features :

Example Prediction

                     Mean Ozone Level (ppb)    95% Prediction Interval
                                               Lower limit    Upper limit
Linear Model [1]     41.06939                  -2.581093      84.71988
Loess Model          32.99557                  30.053930      35.93720

Reference: airquality data set from R datasets package
[1] These negative values are not helpful
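One way the footnote on negative values could be triggered is sketched below. The function flag_negative and the column names fit, lwr and upr are hypothetical. The numbers are those of the example prediction above.

```r
# Hypothetical helper: marks rows whose estimate or interval limits are
# negative, since a negative ozone concentration is not meaningful.
flag_negative <- function(pred) {
  pred$flagged <- pred$fit < 0 | pred$lwr < 0 | pred$upr < 0
  pred
}

# Values taken from the example prediction above
example <- data.frame(
  model = c("Linear Model", "Loess Model"),
  fit = c(41.06939, 32.99557),
  lwr = c(-2.581093, 30.053930),
  upr = c(84.71988, 35.93720)
)

flagged <- flag_negative(example)
flagged$flagged  # TRUE for the linear model, FALSE for the loess model
```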

Strengths

  1. The user has several input options, such as choosing the predictor and specifying the current month, and can therefore obtain mean ozone levels under a variety of conditions.
  2. An interactive plot is displayed, which updates with the user's input.
  3. Both linear and non-linear predictions are returned, so the user can choose whichever prediction they prefer.
  4. The user is warned when a prediction is not meaningful, such as a negative value.
  5. Predictions are coupled with 95% prediction intervals, which quantify the uncertainty in each prediction.

Limitations and Plans to tackle the limitations

  1. Small data set - In search of larger data sets representing similar variables
  2. Only 5/12 months included (May to September) - In search of data sets containing measurements from throughout the year
  3. Data was not divided into train/test sets because the sample is already small - Prediction intervals, and the associated probability of error, are provided instead